# Q-ary LDPC codes based on RA structure for optimization of optical fibre communication system

LI YINGYING<sup>1</sup>, WANG SUNAN<sup>1,2</sup>, WANG YONGXUE<sup>1</sup>

Abstract. In order to design a better optical fiber communication system, a multivariate LDPC encoding system based on RA structure is proposed. As a kind of efficient error correcting code, low density parity check code (LDPC) has good performance. It overcomes the shortcomings of other error correcting codes and is the nearest channel coding found from the Shannon limit. Since it was rediscovered, the LDPC code has received a lot of attention. It is another hotspot in the field of channel coding after Turbo code, and has become the preferred coding scheme for the fourth-generation mobile communications. The main research work is to apply the multiple LDPC codes to the optical fiber communication system on the basis of the four-bit interleaved LDPC coded modulation (BI-LDPC-CM) scheme of binary LDPC codes. So that it can meet the growing demand for high capacity transmission of fiber links. In the aspect of hardware implementation, the encoding and decoding of quaternion LDPC code is realized by FPGA. The experimental results show that the construction method of the RA code structure can improve the calculation of the product of the information bits of all lines and the non-zero elements in the check matrix.

Key words. RA structure, q-ary LDPC code, FPGA, Max-log-BP decoding algorithm.

#### 1. Introduction

Low Density Parity Check Code (LDPC) code is a kind of efficient error-correcting code with good performance, which overcomes the shortcomings of other error-correcting codes. It is the channel code found at present and the closest to Shannon's limit [1]. Since the rediscovered LDPC codes have been widely concerned, it is another hotspot in the field of Turbo code channel coding, which becomes the preferred coding scheme of the fourth generation mobile communication. At present, the binary LDPC code has been widely used in a variety of digital communication systems. However, with the same parameters, the Tanner graphs of the q-ary LDPC code are sparser than that of the binary LDPC code, and its girth is larger than the binary

<sup>&</sup>lt;sup>1</sup>Shenzhen Polytechnic, 518000, China

<sup>&</sup>lt;sup>2</sup>Corresponding author

LDPC code. This feature can facilitate the optimization design of LDPC codes and reduce the effects of the short-loop and the stop-set on the convergence of the decoding [2].

Therefore, the q-ary LDPC codes can design good codes with a lower error leveling and better error correction ability to meet the high-speed transmission needs of the communication system. We focus on the design of the q-ary LDPC codes based on RA structure and the FPGA hardware implementation based on Max-log-BP decoding algorithm. From the view of performance, speed and resource consumption of codec, it can meet the general communication requirements. The application of this hardware implementation in the underwater acoustic communication network can realize the high-speed and efficient transmission of information. Combined with radio technology, the omnidirectional and three-dimensional communication can be established in the ocean, which can help people to observe and develop the ocean.

## 2. Literature review

At present the binary LDPC code has been widely used in the optical fibre communication system [3]. However, with the same parameters, the Tanner graphs of the q-ary LDPC code are sparser than that of the binary LDPC code, and its girth is larger than the binary LDPC code. This feature can facilitate the optimization design of LDPC codes and reduce the influence of the short-loop and stop-set on the convergence of the decoding [4]. Therefore, the q-ary LDPC codes can design good codes with a lower error leveling and better error correction ability to meet the high-speed transmission needs of the communication system.

In his paper in 1962, Dr. Gallager proposed the idea of LDPC coding, which is a kind of linear block code, and its special feature is that its check matrix is very sparse [5]. According to the idea of Gallager, when the LDPC code word is constructed by using the check matrix, the generated code word can satisfy the constraint condition of linear, and each symbol contained in the generated code word satisfies certain constraints. In the iterative decoding theory and some other theory aspects, Gallager proposed LDPC code, which far surpasses the Turbo code. But at the time, many researchers were busy with BCH code, Reed-Solomon codes and concatenated codes, and under the impact of the speed of the encoding storage and decoding computation, except for a few researchers, LDPC codes were not able to attract people's enough attention, and it was even forgotten. Until 1996, MacKay and Neal and others found that when they used the optimal decoder to decode, the LDPC proposed by Gallager LDPC was a good code with a lower linear decoding complexity [6], and more and more researchers will pay attention to the LDPC code.

Tanner extended the Gallager construction, introduced the binary graph model [7] into the LDPC code, and proposed the Tanner graph. Tanner graph can show the connection between the code word symbols and its constraints. When the constraints are all binary checksum, you can get Gallager code. In addition to this research, Tanner improved the algorithm idea of Gallager, the decoders used in the process of the implementation of the iterative decoding are independent each other. In terms of LDPC code constraint, the check constraint is improved as the linear constraints,

which makes the LDPC code be widely used. Therefore, Tanner has great contribution to the LDPC code. Wiberg re-studied on the basis of the Tanner graphs, and proposed the "minimum sum" and "sum-product" algorithms. Richardson and Luby studied the threshold effect of LDPC code. If the block length is infinitely close to the infinite value, and the noise interference is very small, below the threshold, then the system can make the bit error rate arbitrarily small. When the noise level is greater than this threshold, the bit error probability is greater than a positive constant. Luby used the threshold to analyse, and firstly studied the irregular structure of LDPC code, the weight of each row and each column of the check matrix for is not a constant, and the "Density Evolution" iterative algorithm is used to build irregular LDPC code [8].

Domestic research is also very rich, and the main contributions are as follows. Hu Xiaoyu (School of Computer Science and Technology, Peking University, Beijing 100871, China) proposed a new PEG algorithm that can maximize the local loops of variable nodes [9]. Yu Kou and Shu Lin propose an algorithm of point and line based on Euclidean space and projection geometry on finite fields to design LDPC codes, which have achieved good performance. In addition, in the hardware implementation, the most is the research based on binary LDPC code research, but the q-ary LDPC code involved little. Davey and MacKay promoted the binary binary LDPC code to the q-ary LDPC [10]. In a multidimensional domain, the message of q-ary LDPC code is not a single one-bit symbol, and it consists of multiple components [11]. For the non-binary LDPC code, its coding complexity is increased, the calibration information is also complex, but the decoding process of non-binary LDPC code is relatively simple, and the decoding algorithm is different from the binary LDPC code decoding method, and we need to pay attention on it [12].

Now, people mainly explore deeply the LDPC code from the following two levels. First, in order to improve performance, they usually use large code length binary LDPC code to approach Shannon limit; Second, it do not need to use long code length on the study of q-ary LDPC code, and mainly concentrated in the small code length [13]. Regardless of binary or q-ary LDPC codes, most researches focus on the construction of parity check matrix, exploration of new coding and decoding algorithms, and analysis of the performance of codes, so as to lay a solid theoretical basis for the use of LDPC codes in time production. This paper is the study of q-ary LDPC codes.

### 2.1. Design of FPGA encoder based on RA structure

The code word vector of LDPC code consists of information bits and check bits. The encoding process of the LDPC fast encoding algorithm based on RA structure is very simple, and its implementation is mainly divided into two parts. First, you need to calculate the product of the non-zero elements in the check matrix H and the information bits, and use the idea of iteration to compute the parity bit, and then combine the information bits with the parity bit, which forms the desired code word. In the specific implementation, the form of the look-up table is used to store the column address of the non-zero element of parity matrix H in the ROM, which

can simplify the calculation of parity bit. The entire encoder consists of control module, input module, coding module and output module.

Under the control of the clock and reset signal and combined with the counter, the control module provides the control signal to several other input, coding and output modules through the conversion between the states, including several important control signals: the read write signal of the ROM and RAM and the enable signal of each module. The block diagram of the control module is shown in Fig. 1.



Fig. 1. Design diagram of control system

The role of the input module is that use the ping-pong operation to complete the buffer on the input data, and then extract the corresponding address with the address output by ROM look-up table, and output the prepared data that calculates the parity bit to the encoding module. As a result of the construction method of the RA code structure, the calculation of the product of the information bits of all lines and the non-zero elements in the check matrix becomes the key of this method, that is, calculate  $r_i = \sum_{j=1}^k h_{ij} \cdot m_j$ ,  $i = 1, 2, \dots, 486$ . The advantage of putting the computation of  $r_i$  in the input module is that it reduces the computational complexity of the parity bits in the subsequent encoding module.

Therefore, the calculation of value  $r_i$  becomes a critical step in the input module. The specific implementation plan is as follows: The RAM group address storing the storage information element  $m_j$  is calculated, and the address is stored in the read-only one-port ROM. Then, the non-zero element value in the H matrix corresponding to the calculation  $r_i$  is stored in the other lookup table ROM. Finally, in the cooperation of the clock, the information element  $m_1, m_2, ..., m_k$  and the corresponding  $m_1, m_2, ..., m_k$  are taken out in turn, and the product of  $m_1, m_2, ..., m_k$  and the corresponding  $h_1, h_2, ..., h_k$  are calculated. The result is accumulated, and the cumulative result is  $r_i$ .

The main function of the coding module is the calculation of parity bit. The intermediate variable  $r_i$  output by the input module is transformed to the encoding module to calculate the relevant parity bit  $p_i$  and the temporary information bit  $m_i$ , which is sent to the output module after the merger. The calculation of the parity bit main use the recursive idea. The specific realization is completed by the multiply-add unit, counter, delay register, and multi-select data selector. The calculation of each specific parity bit is done by the multiply-add unit. The counter is used to generate the chip select signal of the data selector, and the data selector selects the specified parity bit to control the output of the data. The delay register

can generate the delay signal  $p_{i-j}$  to achieve the delay function of each parity bit. The specific block diagram of the coding module is shown in Fig. 2.



Fig. 2. Structure Block diagram of the coding module

The main function of the output module is to synthesize the final code word, and buffer and output the code word. The module combines the parity bit information obtained by the coding module with the information bit of the buffer and then separates the parity bit from the information bit through a substitution table consisting of a read only single-port ROM whose address is the corresponding parity bit and information bit, and finally double-buffered ping-pong technology is used to read out to the received parity bit and information bits, and then get the code word output [14].

When the encoder is implemented with EP3C55F780C8 model FPGA, the logic unit consumed is 80 %, and the register unit occupies 64 % of the total resource. At this point, the combined clock frequency is 46.34 MHz. Since the quaternion equation is used, each codeword requires two bits, so the data rate is  $46.34 \times 2 = 92.68$  MHz. Compared with binary LDPC codes, the hardware resources consumed by FPGAs are relatively large when FPGA is used to implement LDPC codes. In addition, the coding complexity and coding delay are increased.

# 3. Design of FPGA Decoder Based on Max-log-BP Algorithm

One of the difficulties in the design of LDPC code decoders is that the transmission of information can be effectively carried out on the basis of iteration, so that the decoder can be divided into three kinds of decoding structures: full serial, full parallel and partly parallel.

Full serial architecture: The full serial structure decoder mainly consists of check node units (CNU), variable node units (VNU), control units and intermediate information RAM. The operation of the serial decoder is: The control unit controls the CNU and VNU to alternately complete the operation between the variable node and the check node. The function of each unit is: The CNU updates the check node

information, and VNU updates the variable node information. The intermediate information RAM is used to store the intermediate results of the CNU and VNU iterative operations. The control unit can coordinate the working sequence of CNU and VNU and control the iterative decoding process.

When using the full serial decoding structure, because CNU and VNU are working alternately, it is possible to avoid the access violation of CNU and VNU to intermediate information RAM by designing a multiplexing module to arrange the access reasonably. Because of the alternating work of CNU and VNU, this causes the calculation of the variable nodes to be waited when calculating the check node. In this way, the full-serial decoder can avoid the occurrence of access conflict and consumption of hardware resources at least, but the processing speed is slow. The full-serial decoder is suitable for hardware resource requirements, but the speed of data processing is not demanding.

Full parallel structure: The form of the full parallel structure is very similar to the bipartite graph, containing m CNUs and n VNUs. In the decoder of fully parallel architecture, both the check node and the variable node have their respective independent computing units. In the iteration, it does not have to wait for the calculation results of other units. In the previous clock cycle, all the check nodes can be updated, and the last clock cycle can complete the update of all variable nodes. Therefore, the intermediate information generated by the iterative computation is not required to be stored like a full serial structure decoder. This enables the decoder of the structure to have high speed of operation and can basically perform real-time decoding. However, the number of computing units in the full parallel architecture decoder is too large, and the resource consumption is very large. The critical path length is increased when routing is adopted, which makes the utilization of the chip very low and is not suitable for long format data format.

Partially parallel structure: Logical resources are important in the design of FPGA. Most designs are expected to be implemented with minimal logical resources in order to reduce costs. The full serial structure of the decoder occupies less resources, because it uses the idea of logical multiplexing. Through the repeated use of CNU and VNU, the consumption of resources is reduced. In combination with the structure of full parallel decoding and the idea of multiplexing, a partially parallel structure is used to decode multivariate LDPC codes. The design of FPGA pays attention to the idea of module reuse, and this part parallel design can fully reflect this idea. Structurally, CNU and VNU between partial parallel decoders are cached through a RAM storage array of dual ports. Compared with the decoder with full parallel structure, the number of CNU and VNU is different, and the decoder of partial parallel structure introduces the problem of parallelism.

For example, it is assumed that the row number of the check matrix H is M, and the column number is N. The ranks of the parallel degrees, respectively, are p and q, and they are divisors of m and n. The decoder has Q VNU arithmetic units and P CNU arithmetic units. Each VNU arithmetic unit is responsible for updating the calculated N/q column variable nodes, and each CNU arithmetic unit is responsible for updating the calculated M/p line check nodes. Therefore, it takes N/q times to complete the full update of the variable node. That is, in N/q clock cycles, the N

column variables are updated in chronological order. Furthermore, Q variable nodes can be calculated in each clock cycle, and it takes N/q clock cycles to complete the update calculation of all variable nodes. The updating process of the test node is similar, and there is no more description here. If the resource consumption situation of FPGA is concerned, when the decoder uses full parallel mode, the logical units occupied by the check node and the variable node are  $C_1$  and  $C_2$ , respectively. Then, when the structure of the decoder is changed to partially parallel, the entire decoding process requires the resources to be  $C = (C_1/p) + (C_2/q)$ . At the same time, the consumption of hardware resources is reduced by changing the structure of the decoder. Of course, the decoding time of this partial parallel structure is longer. Therefore, in the selection of structures, we should give full consideration to the balance of speed and area, in order to meet the requirements of the design system.

The overall design of the decoder block diagram is shown in Fig. 3. The modular structure runs through the entire decoder design process. It can be seen from the block diagram that the decoder mainly is composed of the input buffer module, CNU module, VNU module, external information storage module, output buffer module, decode decision output module and the control unit module. With the cooperation of the clock, the work of the whole decoder is controlled by the control unit module. First, the initial information enters into the input buffer unit after quantization, and completes the serial conversion, and then the decoder enters working condition. Next, the check node and the variable node are updated, and the CNU and VNU exchange information through the external information storage module. Finally, after the output buffer module completes the parallel-serial conversion, the data enters the decoding decision module to judge the obtained code word C satisfies  $C \cdot HT = 0$ , or reach the maximum decoding iterations, so as to get the final output code word sequence.



Fig. 3. Overall design diagram of decoder

In the LOG operation domain, each variable point of the q-ary LDPC code has q-1 initial information, and the binary LDPC code has only one. Since each variable point of the quaternary LDPC code has 3 channel initial information  $L(v) = L(v_1, v_2, v_3)$ , the operation amount of the decoder becomes larger and occupies more logic resources. However, in the whole decoding process, the information vector of each variable point is also involved in the operation, so  $L(v) = L(v_1, v_2, v_3)$  can be regarded as a whole to store and transfer, and  $L(v) = L(v_1, v_2, v_3)$  can be separated

only in the operation of the check node unit and the variable node unit. CNU module achieves the relevant Field word operation [15]. The Field word calculation formula is not introduced too much here.

The load balancing of reducing ends with different inclination is shown in Fig. 4. In this figure,  $Lx_1$ ,  $Ly_1$ ,  $Lx_2$ ,  $Ly_2$ ,  $Lx_3$  and  $Ly_3$  are the input vectors of two variable points for each Field word operation. CNU design part must have two functions. One of the functions is to complete the Field word operation, and the other determines that the output results meet the calibration equation. Only when the CNU module calculates that the result of each checkpoint is 0, it indicates that the calculated code word is correct, otherwise the code word is incorrect, iterating again until the correct code word is reached or the maximum iterations is reached.



Fig. 4. Load balancing of reducing ends with different inclination

In the design of the VNU module, the calculation of the formula is mainly completed. Symbol L(qmn) denotes the external information that the variable node passes to the check node with respect to the value of the variable node; pn denotes the initialization iteration information; L(rm'n) denotes that the check node passes to the variable node with respect to the log-likelihood information required the check equation when the value of the variable node Isai. Symbol  $L_{\rho}(m,n,l)$  denotes the degree distribution function of the check node. Taking the quaternary LDPC code as an example, the function of the VNU module is to compute the information vector, and the information vector is divided into three sub information vectors. Symbol L(v) denotes the channel initial information of each variable. Symbols L(A), L(B), L(C) and L(D) are the information of four variable nodes respectively.

### 4. Conclusion

The decoding complexity of LDPC codes is lower and the performance is closer to the Shannon limit. In contrast to the binary LDPC code, the multivariate LDPC

code is more robust in error correction and burst error resistance. It is more suitable for higher order debugging system and efficient data transmission. The multiple LDPC codes are introduced into the optical fiber communication system, and the design of the coder and decoder of the multiple LDPC codes is completed. The main main conclusions of this study are as follows:

First, the encoding method of multivariate LDPC code based on RA structure is introduced in detail. The decoding performance of the maximum BP algorithm in the logarithmic domain, EMS and the modified EMS algorithm in AWGN channel are compared. Finally, a fast decoding method based on FFT-BP and combining four-dimensional modulation is selected. Multi LDPC codes are used in optical fiber communication systems to achieve high coding gain, spectral efficiency and low power consumption, thus satisfying the requirements of high speed and even large capacity transmission of optical fiber.

Second, the FPGA chip used in there is the EP3C55F780C8 of the Cyclone III series device. Using VerilogHDL hardware description language, and combining Quartus II, Matlab and Modelsim development software, the design of encoding and decoding of multiple LDPC codes is realized. Encoding uses an algorithm based on the RA structure, and the decoding uses an algorithm based on the Max-log-BP. The detailed block diagram of each module is given in the design, and the function of each module is explained. The simulation waveform of Modelsim is given. Whether from the resource occupation or the decoding rate, the designed codec of the multiple LDPC codes basically meet the requirements of communication.

Of course, there are still many deficiencies in the design, especially the resource and performance problems in the hardware implementation and the efficiency of the system. The next step of research can be carried out in two aspects: In the hardware implementation phase, the resources of the multi-LDPC code are much more than the resources occupied by the binary LDPC code. Therefore, the problem that needs to be solved urgently is how to enhance the performance and rate of multiple LDPC codes under the condition of limited resources. This needs to improve the existing encoding and decoding algorithms and to maximize the module reuse at the time of design in order to better solve the constraints of resources, performance and speed Although the multivariate LDPC code is introduced into the optical fiber communication system, it is verified only in the theoretical simulation, and has not been built into a specific communication system. The whole optical fiber communication needs to be improved so as to meet the growing demand for high capacity optical fiber transmission.

#### References

- J. Ao, J. Liang, C. Ma, G. Cao, C. Li, Y. Shen: Optimization of LDPC codes for pin-based OOK FSO communication systems. IEEE Photonics Technology Letters 29 (2017), No. 9, 727–730.
- [2] M. Alibakhshi-Kenari, M. Naser-Moghadasi, R. A. Sadeghzadeh, B. S. Virdee, E. Limiti: Traveling-wave antenna based on metamaterial transmission line structure for use in multiple wireless communication applications. AEU International

- Journal of Electronics and Communications 70 (2016), No. 12, 1645–1650.
- [3] L. B. Shen, X. M. Zhao, X. Y. Zhu, L. Wang, Q. J. Liu, B. Z. Wei: A design for a remote condition monitoring system for an optical fibre smart structure based on advanced reduced instruction set computing (RISC) machines (ARM) and general packet radio service (GPRS). Lasers in Engineering 30 (2015), Nos. 1–2, 15–29.
- [4] C. Gong, S. Li, Q. Gao, Z. Xu: Power and rate optimization for visible light communication system with lighting constraints. IEEE Transactions on Signal Processing 63 (2015), No. 16, 4245–4256.
- [5] W. G. MA, Y. CAO, W. WEI, X. H. HEI, J. F. MA: Energy-Efficient collaborative communication for optimization cluster heads selection based on genetic algorithms in wireless sensor networks. International Journal of Distributed Sensor Networks (2015), Article No. 103.
- [6] H. Wang, H. Zhang, F. Yang, S. Song, Y. Chang, C. Bei, K. Yang: Parametric optimization of regenerative organic Rankine cycle system for diesel engine based on particle swarm optimization. Energies 8 (2015), No. 9, 9751–9776.
- [7] C. ABELLÁN, W. AMAYA, D. MITRANI, V. PRUNERI, M. W. MITCHELL: Generation of fresh and pure random numbers for loophole-free bell tests. Physical Review Letters 115 (2015), No. 25, paper 250403.
- [8] D. J. SAUNDERS, J. H. MUNNS, T. F. CHAMPION, C. QIU, K. T. KACZMAREK, E. POEM, P. M. LEDINGHAM, I. A. WALMSLEY, J. NUNN: Cavity-enhanced roomtemperature broadband raman memory. Physical Review Letters 116 (2016), No. 9, paper 090501.
- [9] A. M. Queirós, P. Taylor, A. Cowles, A. Reynolds, S. Widdicombe, H. Stahl: Optical assessment of impact and recovery of sedimentary pH profiles in ocean acidification and carbon capture and storage research. International Journal of Greenhouse Gas Control 38 (2015), 110–120.
- [10] G. DAVID, K. ESAT, S. HARTWEG, J. CREMER, E. CHASOVSKIKH, R. SIGNORELL: Stability of aerosol droplets in Bessel beam optical traps under constant and pulsed external forces. Journal of Chemical Physics 142, (2015), No. 15, paper 154506.
- [11] T. Feng, J. J. McGuirk: Measurements in the annular shear layer of high subsonic and under-expanded round jets. Experiments in Fluids 57 (2016), Article No. 7, paper 25
- [12] A. FIGAROL, J. POURCHEZ, D. BOUDARD, V. FOREST, S. BERHANU, J. M. TULLIANI, J. P. LECOMPTE, M. COTTIER, D. BERNACHE-ASSOLLANT, P. GROSSEAU: Thermal annealing of carbon nanotubes reveals a toxicological impact of the structural defects. Journal of Nanoparticle Research 17 (2015), No. 4, 1–14.
- [13] W. Cai, P. Wang, Y. Li, Y. Zhang, B. Vucetic: Deployment optimization of uniform linear antenna arrays for a two-path millimeter wave communication system. IEEE Communications Letters 19 (2015), No. 4, 669–672.
- [14] R. GANGULA, D. GESBERT, D. GÜNDÜZ: Optimization of energy harvesting miso communication system with feedback. IEEE Journal on Selected Areas in Communications 33 (2015), No. 3, 396–406.
- [15] X. LIU, M. QIU, X. WANG, W. LIU, K. CAI: Energy efficiency optimization for communication of air-based information network with guaranteed timing constraints. Journal of Signal Processing Systems 86 (2017), Nos. 2–3, 299–312.